{smcl}
{* 14jan2022}{...}
{cmd:help mata mm_quantile()}
{hline}

{title:Title}

{p 4 4 2}
{bf:mm_quantile() -- Empirical quantile function}


{title:Syntax}

{p 8 23 2}
{it:real matrix}{bind:    }
{cmd:mm_quantile(}{it:X} [{cmd:,} {it:w}{cmd:,} {it:P}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}]{cmd:)}

{p 8 23 2}
{it:real rowvector}{bind: }
{cmd:mm_median(}{it:X} [{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}]{cmd:)}

{p 8 23 2}
{it:real rowvector}{bind: }
{cmd:mm_iqrange(}{it:X} [{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}]{cmd:)}

{p 8 23 2}
{it:real colvector}{bind: }
{cmd:_mm_quantile(}{it:x} [{cmd:,} {it:w}{cmd:,} {it:P}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}]{cmd:)}

{p 8 23 2}
{it:real scalar}{bind:    }
{cmd:_mm_median(}{it:x} [{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}]{cmd:)}

{p 8 23 2}
{it:real scalar}{bind:    }
{cmd:_mm_iqrange(}{it:x} [{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}]{cmd:)}

{p 4 8 2}
where

{p 14 18 2}{it:X}:  {it:real matrix} containing data (rows are observations, columns variables)

{p 14 18 2}{it:x}:  {it:real colvector} containing data (single variable)

{p 14 18 2}{it:w}:  {it:real colvector} containing weights

{p 14 18 2}{it:P}:  {it:real matrix} containing evaluation probabilities; default is {it:P} = (0, .25, .5, .75, 1)'

{p 12 18 2}{it:def}:  {it:real scalar} selecting quantile definition; {it:def} in {c -(}0,...,11{c )-}; default is {it:def} = 2

{p 13 18 2}{it:fw}:  {it:real scalar} causing weights to be interpreted as frequency weights

{p 13 18 2}{it:wd}:  {it:real scalar} setting width of evaluation interval for trimmed Harrell-Davis
estimator; argument {it:wd} is only relevant with {it:def}=10; 
{it:wd}<=0 sets the width to 1/sqrt(n), where n is the effective sample size; 
0<{it:wd}<1 sets the width to {it:wd}; {it:wd}>=1 uses the untrimmed estimator;
the untrimmed estimator is also used if {it:wd} is omitted or set to missing


{title:Description}

{pstd}
{cmd:mm_quantile()} applies to {it:P} the inverse
empirical cumulative distribution function of {it:X} (the
so called quantile function). That is, {cmd:mm_quantile()}
returns the quantiles of {it:X} corresponding to the
probabilities provided by {it:P}. For example,

{p 8 8 2}
{cmd:mm_quantile(x, 1, 0.25)}

{pstd}
returns the first quartile (i.e. the 0.25-quantile) of {cmd:x}. Missing values
in {it:X} are not allowed.

{pstd}
{cmd:mm_quantile()} works column by column. If
{it:P} has one column and {it:X} has several columns, then the
quantiles corresponding to {it:P} are computed for each column of
{it:X}. If {it:X} has one column and {it:P} has several columns, then the
quantile function of {it:X} is applied to each column of {it:P}. If
{it:X} and {it:P} both have several columns, then the number of
columns is required to be the same and quantiles are
computed column by column.

{pstd}
Argument {it:w} specifies weights associated with the observations (rows) in
{it:X}. Omit {it:w}, or specify {it:w} as 1 to obtain unweighted results. Missing
values or negative values in {it:w} are not allowed.

{pstd}
Argument {it:def} selects the quantile definition to be used. Let Q(p) denote
the p-quantile of X. N is the sample size (number of observations), X_(j)
is the j-th observation from sorted X, w_j is the weight associated with
observation j (equal to 1 if weights are omitted), and W is the total sum of
weights. The quantile definitions then are as follows:

{phang2}
    {it:def}=0: The "high" quantile defined as Q(p) = X_(j) with j selected
    such that W_(j-1) <= p*W < W_(j), where W_(j) is running sum of weights
    across the sorted data up to and including observation j (sorting of ties
    does not matter).

{phang2}
    {it:def}=1: Inverse of the empirical cumulative distribution function
    (ECDF), or "low" quantile, defined as Q(p) = X_(j) with j selected such
    that W_(j-1) < p*W <= W_(j).

{phang2}
    {it:def}=2: Like definition 1 but using averages where the ECDF is flat.
    Again, select j set such that W_(j-1) < p*W <= W_(j). Then Q(p) = X_(j) if
    W_(j-1) < p*W < W_(j) and Q(p) = (X_(j) + X_(j+1))/2 if p*W = W_(j).

{phang2}
    {it:def}=3: Nearest order statistic. Again, select j set such that W_(j-1)
    < p*W <= W_(j). Then Q(p) = X_(j-1) if p*W is closer to W_(j-1) and Q(p) =
    X_(j) if p*W is closer to W_(j). If p*W lies exactly in the middle between
    W_(j-1) and W_(j), then Q(p) = X_(j) if j is even and Q(p) = X_(j-1) if j
    is odd. For definition 3, the sorting of ties in X matters if {it:fw}=0 and
    the weights are not constant; to obtain stable results, {cmd:mm_quantile()}
    sorts the data in ascending order of the weights within ties of X. If
    weights are constant or if {it:fw}!=0, the sort order of ties does not
    matter.

{phang2}
    {it:def}=4: Linear interpolation of the EDCF. Let h = p*n where n is the
    "effective" sample size defined as n = W^2 / sum(w_j^2) (or as n = W if
    {it:fw}!=0). Q(p) is then computed by integration of the empirical
    distribution of X between p0 = (h-1)/n and p1 = h/n. In the unweighted case
    (or if all weights are equal), this boils down to Q(p) = X_(l) +
    (h-l)*(X_(l+1) - X_l) with n = N and l = foor(h).

{phang2}
    {it:def}=5: Like {it:def}=4, but with h = p*n + 0.5.

{phang2}
    {it:def}=6: Like {it:def}=4, but with h = p*(n + 1).

{phang2}
    {it:def}=7: Like {it:def}=4, but with h = p*(n - 1) + 1.

{phang2}
    {it:def}=8: Like {it:def}=4, but with h = p*(n + 1/3) + 1/3.

{phang2}
    {it:def}=9: Like {it:def}=4, but with h = p*(n + 1/4) + 3/8.

{phang2}
    {it:def}=10: Harrell-Davis quantile estimator based on integration of the
    Beta function with shape parameters p*(n+1) and (1-p)(n+1). Specify argument
    {it:wd} to use the trimmed Harrell-Davis estimator as suggested by 
    Akinshin (2021).

{phang2}
    {it:def}=11: Mid-quantile by Ma, Genton, and Parzen (2011). The quantile
    is obtained by linear interpolation of the empirical mid-distribution
    function at unique values of X. The mid-distribution
    function is defined as F_mid(x) = Pr(X<=x) - 0.5*Pr(X=x).

{pmore}
    Definition 2 is the default definition used by {cmd:mm_quantile()},
    {cmd:mm_iqrange()}, and {cmd:mm_median()}. Definition 2 is also the
    definition that is used by Stata commands {helpb summarize} and
    {helpb _pctile}, whereas Stata command {helpb centile} (as well as
    {helpb _pctile} with option {cmd:altdef}) uses definition 6. Note that
    quantile functions based on definitions 0-3 are discontinuous; quantile
    functions based on definitions 4-11 are continuous (but note that quantile
    functions according to definitions 4-9 may have flat regions). Based on an
    analysis of various properties of the different definitions, Hyndman and
    Fan (1996) suggest definition 8 as the best choice. However, note that
    definition 5 is the only definition that satisfies all 6 properties
    considered by Hyndman and Fan (1996). Moreover, the Harrell-Davis estimator
    (definition 10), which has not been considered in the analysis by Hyndman
    and Fan (1996), may have better small-sample properties than the other
    estimators. Furthermore, Ma et al. (2011) suggest definition 11 for
    data with heaping or discrete data.

{pmore}
    If weights are all equal to 1, definitions 1-9 are equivalent to the
    quantile definitions provided by Hyndman and Fan (1996), definition 10 is
    as proposed by Harrell and Davis (1982), and definition 11 is as proposed
    by Ma et al. (2011). The extension to incorporate weights is
    straightforward for quantile definitions 0-2 and 11, but somewhat arbitrary
    for quantile definition 3. For quantile definitions 4-10, the extension to
    weighted data is based on the approach by Akinshin (2020).

{pstd}
Argument {it:fw}!=0 requests that the weights are to be treated as frequency
weights (this is only relevant for definitions 3-10; for definitions 0-2 and
11, the distinction between frequency weights and other types of weights is
irrelevant). If {it:fw}!=0 is specified, the quantiles are computed in a way
such that results from the weighted data are identical to the results you would
obtain from expanded data (i.e., from data in which the observations are
duplicated {it:w} times and the weights are set to 1; naturally, such an
analogy only makes sense if {it:w} is integer, although non-integer weights are
allowed). For definitions 4-10, argument {it:fw} also determines how the
"effective" sample size that determines the width of the integration window is
defined. If {it:fw}!=0, the sample size is set to the sum of weights (n = W);
if {it:fw} is omitted or if {it:fw}==0, the effective sample size is set to n =
W^2 / sum(w_j^2) (Kish's formula). If you want to use another effective sample
size, rescale the weights such that their sum is equal to the desired sample
size and then specify {it:fw}!=0 when calling {cmd:mm_quantile()}.

{pstd}
{cmd:mm_median()} and {cmd:mm_iqrange()} are wrappers for {cmd:mm_quantile()}
that return the median (the 0.5-quantile, using quantile definition 2) and the
inter-quartile range (IQR = 0.75-quantile - 0.25-quantile).

{pstd}
{cmd:_mm_quantile()}, {cmd:_mm_median()}, and {cmd:_mm_iqrange()} are
like {cmd:mm_quantile()}, {cmd:mm_median()}, and {cmd:mm_iqrange()}, but assume
that the data is sorted (in ascending order) and non-missing and that the
weights are non-negative and non-missing. The functions are fast, especially
in the unweighted case for definitions 0-9 (once the data is sorted,
unweighted quantiles of definition 0-9 can be taken in practically no time;
Harrell-Davis quantiles are more expensive to compute). However, the functions
will return invalid results if applied to unsorted data (or if any of the other
assumptions is violated). Only a single column of data is allowed as input to
these functions.


{title:Example}

    {com}: x = rnormal(10000, 1, 0, 1)
    {res}
    {com}: mm_quantile(x, 1, (0.25 \ 0.5 \ 0.75))
    {res}       {txt}           1
        {c TLC}{hline 16}{c TRC}
      1 {c |}  {res}-.6724194826{txt}  {c |}
      2 {c |}  {res} .0005707902{txt}  {c |}
      3 {c |}  {res}  .677781255{txt}  {c |}
        {c BLC}{hline 16}{c BRC}

    {com}: mm_median(x, 1), mm_iqrange(x, 1)
    {res}       {txt}          1             2
        {c TLC}{hline 29}{c TRC}
      1 {c |}  {res}.0005707902   1.350200738{txt}  {c |}
        {c BLC}{hline 29}{c BRC}{txt}


{title:Conformability}

    {cmd:mm_quantile(}{it:X}{cmd:,} {it:w}{cmd:,} {it:P}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}{cmd:)}:
             {it:X}:  {it:r1 x c1}
             {it:w}:  {it:r1 x} 1 or 1 {it:x} 1
             {it:P}:  {it:r2 x c2}
           {it:def}:  1 {it:x} 1
            {it:fw}:  1 {it:x} 1
            {it:wd}:  1 {it:x} 1
        {it:result}:  {it:r2 x c2} or {it:r2 x c1}

    {cmd:mm_median(}{it:X}{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}{cmd:)}:
             {it:X}:  {it:r x c}
             {it:w}:  {it:r x} 1 or 1 {it:x} 1
           {it:def}:  1 {it:x} 1
            {it:fw}:  1 {it:x} 1
            {it:wd}:  1 {it:x} 1
        {it:result}:  1 {it:x c}

    {cmd:mm_iqrange(}{it:X}{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}{cmd:)}:
             {it:X}:  {it:r x c}
             {it:w}:  {it:r x} 1 or 1 {it:x} 1
           {it:def}:  1 {it:x} 1
            {it:fw}:  1 {it:x} 1
            {it:wd}:  1 {it:x} 1
        {it:result}:  1 {it:x c}

    {cmd:_mm_quantile(}{it:x}{cmd:,} {it:w}{cmd:,} {it:P}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}{cmd:)}:
             {it:x}:  {it:r1 x} 1
             {it:w}:  {it:r1 x} 1 or 1 {it:x} 1
             {it:p}:  {it:r2 x c2}
           {it:def}:  1 {it:x} 1
            {it:fw}:  1 {it:x} 1
            {it:wd}:  1 {it:x} 1
        {it:result}:  {it:r2 x c2}

    {cmd:_mm_median(}{it:x}{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}{cmd:)}:
             {it:X}:  {it:r x} 1
             {it:w}:  {it:r x} 1 or 1 {it:x} 1
           {it:def}:  1 {it:x} 1
            {it:fw}:  1 {it:x} 1
            {it:wd}:  1 {it:x} 1
        {it:result}:  1 {it:x} 1

    {cmd:_mm_iqrange(}{it:x}{cmd:,} {it:w}{cmd:,} {it:def}{cmd:,} {it:fw}{cmd:,} {it:wd}{cmd:)}:
             {it:x}:  {it:r x} 1
             {it:w}:  {it:r x} 1 or 1 {it:x} 1
           {it:def}:  1 {it:x} 1
            {it:fw}:  1 {it:x} 1
            {it:wd}:  1 {it:x} 1
        {it:result}:  1 {it:x} 1


{title:Diagnostics}

{pstd}
Evaluation probabilities below 0 and above 1 (including missing) are allowed,
resulting in the observed minimum or maximum, respectively.

{pstd}
{cmd:mm_quantile()}, {cmd:mm_median()}, and {cmd:mm_iqrange()} return error if
{it:X} contains missing values or if {it:w} contains negative values or missing
values (zero weights are allowed, but will be omitted from the
computations). Missing is returned if {it:X} is void. Void is returned if {it:P} is void.

{pstd}
{cmd:_mm_quantile()}, {cmd:_mm_median()}, and {cmd:_mm_iqrange()} assume {it:x} to be
non-missing and sorted (in ascending order) and assume {it:w} to be non-missing
and non-negative. The functions return invalid results if these assumptions
are violated. Missing is returned if {it:x} is void. Void is returned if {it:p} is void.


{title:Source code}

{pstd}
{help moremata_source##mm_quantile:mm_quantile.mata},
{help moremata_source##mm_median:mm_median.mata},
{help moremata_source##mm_iqrange:mm_iqrange.mata}


{title:References}

{phang}
    Akinshin, A. (2020). Weighted quantile
    estimators. {browse "http://aakinshin.net/posts/weighted-quantiles/"}.
    {p_end}
{phang}
    Akinshin, A. (2021). Trimmed Harrell-Davis quantile estimator based on the
    highest density interval of the given 
    width. {browse "http://arxiv.org/abs/2111.11776":arXiv:2111.11776} [stat.ME].
    {p_end}
{phang}
    Harrell, F.E., C.E. Davis (1982). A New Distribution-Free Quantile
    Estimator. Biometrika 69: 635-640.
    {p_end}
{phang}
    Hyndman, R.J., Y. Fan (1996). Sample Quantiles in Statistical
    Packages. The American Statistician 50:361-365.
    {p_end}
{phang}
    Ma, Y., M.G. Genton, E. Parzen (2011). Asymptotic properties of sample
    quantiles of discrete distributions. Annals of the Institute of Statistical
    Mathematics 63:227–243.
    {p_end}


{title:Author}

{pstd}
Ben Jann, University of Bern, ben.jann@unibe.ch


{title:Also see}

{p 4 13 2}
Online:  help for
{bf:{help mf_mm_hdq:mm_hdq()}},
{bf:{help mf_mm_ecdf:mm_ecdf()}},
{bf:{help summarize}}, {bf:{help pctile}}, {bf:{help centile}},
{bf:{help moremata}}
